---
redirect_from:
  - /recipes/snapshots
---

# Implementing data snapshots

<InfoBox>

This functionality only works with data models written in JavaScript, not YAML.
For more information, check out the [Data Modeling Syntax][ref-modeling-syntax]
page.

</InfoBox>

## Use case

For a dataset that contains a sequence of changes to a property over time, we
want to be able to get the most recent state of said property at any given date.
In this recipe, we'll learn how to calculate snapshots of statuses at any given
date for a cube with `Product Id`, `Status`, and `Changed At` dimensions.

<InfoBox>

We can consider the status property to be a
[slowly changing dimension](https://en.wikipedia.org/wiki/Slowly_changing_dimension)
(SCD) of type 2. Modeling data with slowly changing dimensions is an essential
part of the data engineering skillset.

</InfoBox>

## Data modeling

Let's explore the `statuses` cube that contains data like this:

```json
[
  {
    "statuses.order_id": 1,
    "statuses.status": "shipped",
    "statuses.changed_at": "2019-01-19T00:00:00.000"
  },
  {
    "statuses.order_id": 1,
    "statuses.status": "processing",
    "statuses.changed_at": "2019-03-14T00:00:00.000"
  },
  {
    "statuses.order_id": 1,
    "statuses.status": "completed",
    "statuses.changed_at": "2019-01-25T00:00:00.000"
  },
  {
    "statuses.order_id": 2,
    "statuses.status": "processing",
    "statuses.changed_at": "2019-08-21T00:00:00.000"
  },
  {
    "statuses.order_id": 2,
    "statuses.status": "completed",
    "statuses.changed_at": "2019-04-13T00:00:00.000"
  },
  {
    "statuses.order_id": 2,
    "statuses.status": "shipped",
    "statuses.changed_at": "2019-03-18T00:00:00.000"
  }
]
```

We can see that statuses change occasionally. How do we count orders that
remained in the `shipped` status at a particular date?

First, we need to generate a range with all dates of interest, from the earliest
to the latest. Second, we need to join the dates with the statuses and leave
only the most recent statuses to date.

```javascript
cube(`status_snapshots`, {
  extends: statuses,

  sql: `
    -- Create a range from the earlist date to the latest date
    WITH range AS (
      SELECT date
      FROM GENERATE_SERIES(
        (SELECT MIN(changed_at) FROM ${statuses.sql()} AS statuses),
        (SELECT MAX(changed_at) FROM ${statuses.sql()} AS statuses),
        INTERVAL '1 DAY'
      ) AS date
    )

    -- Calculate snapshots for every date in the range
    SELECT range.date, statuses.*
    FROM range
    LEFT JOIN ${statuses.sql()} AS statuses
      ON range.date >= statuses.changed_at
      AND statuses.changed_at = (
        SELECT MAX(changed_at)
        FROM ${statuses.sql()} AS sub_statuses
        WHERE sub_statuses.order_id = statuses.order_id
      )
  `,

  dimensions: {
    date: {
      sql: `date`,
      type: `time`,
    },
  },
});
```

<InfoBox>

To generate a range of dates, here we use the
[`GENERATE_SERIES` function](https://www.postgresql.org/docs/9.1/functions-srf.html)
which is Postgres-specific. Other databases have similar functions, e.g.,
[`GENERATE_DATE_ARRAY`](https://cloud.google.com/bigquery/docs/reference/standard-sql/array_functions#generate_date_array)
in BigQuery.

</InfoBox>

Please note that it makes sense to make the `status_snapshots` cube
[extend](/reference/data-model/cube#extends) the original `statuses`
cube in order to reuse the dimension definitions. We only need to add a new
dimension that indicates the `date` of a snapshot. We're also referencing the
definition of the `statuses` cube with the
[`sql()` property](/reference/data-model/cube#sql).

## Query

To count orders that remained in the `shipped` status at a particular date, we
will send a query that selects a snapshot by this date and also filters by the
status:

```json
{
  "measures": ["status_snapshots.count"],
  "filters": [
    {
      "member": "status_snapshots.date",
      "operator": "equals",
      "values": ["2019-04-01"]
    },
    {
      "member": "status_snapshots.status",
      "operator": "equals",
      "values": ["shipped"]
    }
  ]
}
```

## Result

If we execute a couple of such queries for distinct dates, we'll spot the
change:

```json5
// Shipped as of April 1, 2019:
[
  {
    "status_snapshots.count": 16,
  }
];
```

```json5
// Shipped as of May 1, 2019:
[
  {
    "status_snapshots.count": 25,
  },
]
```

## Source code

Please feel free to check out the
[full source code](https://github.com/cube-js/cube/tree/master/examples/recipes/snapshots)
or run it with the `docker-compose up` command. You'll see the result, including
queried data, in the console.

[ref-modeling-syntax]: /product/data-modeling/syntax
